Abstract

This paper presents a unified mathematical framework for inference in graphical models, building on
the observation that graphical models are algebraic varieties. From this geometric viewpoint, observa- 
tions generated from a model are coordinates of a point in the variety, and the sum-product algorithm is 
an efficient tool for evaluating specific coordinates. The question addressed here is how the solutions to 
various inference problems depend on the model parameters. The proposed answer is expressed in terms 
of tropical algebraic geometry. A key role is played by the Newton polytope of a statistical model. Our 
results are applied to the hidden Markov model and to the general Markov model on a binary tree. 

1 Algebraic Statistics, Tropical Geometry, and Inference 

This paper presents a unified mathematical framework for probabilistic inference with statistical models, 
such as graphical models. Our approach is summarized as follows: 

(a) Statistical models are algebraic varieties. 

(b) Every algebraic variety can be tropicalized. 

(c) Tropicalized statistical models are fundamental for parametric inference. 

By a statistical model we mean a family of joint probability distributions for a collection of discrete
random variables $Y = \{Y_1, \ldots, Y_n\}$. Thesis (a) states that many families of interest can be characterized by
polynomials in the joint probabilities $p_{\sigma_1 \cdots \sigma_n} = \mathrm{Prob}(Y_1 = \sigma_1, \ldots, Y_n = \sigma_n)$. The emerging field of algebraic
statistics [?] offers algorithms for this polynomial representation.

Tropicalization means replacing the arithmetic operations (+, ×) by the operations (min, +). This
process captures the essence of what happens when the joint probabilities $p_{\sigma_1 \cdots \sigma_n}$ are replaced by their
logarithms. The tropicalization of an algebraic variety is a piecewise-linear set which enjoys many features
familiar from algebraic geometry [?]. In particular, the tropicalization of a statistical model is a
piecewise-linear set in the space with logarithmic coordinates $-\log(p_{\sigma_1 \cdots \sigma_n})$.

Thesis (c) states that tropical algebraic geometry of statistical models is fundamental in analyzing the 
behavior of inference algorithms under the variation of model parameters. By inference we mean the eval- 
uation of one or more coordinates of a single point on the algebraic variety, in either (+, x) or (min,+) 
arithmetic. This is the standard notion of inference used for graphical models in statistical learning theory
[15], but it differs from other (more classical) notions of inference in mathematical statistics. By parametric
inference we mean the analysis of the dependence of inference on parameters.

To give a more concrete discussion of parametric inference it is useful to focus on directed graphical 
models. A directed graphical model (or Bayesian network) is a finite directed acyclic graph G with two 
kinds of vertices, observed variables $Y = \{Y_1, \ldots, Y_n\}$ and hidden variables $X = \{X_1, \ldots, X_m\}$, where each
edge is labeled by a transition matrix whose entries are linear forms in some parameters. The rules of
discrete probability express the observed probabilities $p_{\sigma_1 \cdots \sigma_n}$ as polynomials of degree $E$ in the parameters,
where $E$ is the number of edges of $G$. The polynomials parametrize the graphical model as an algebraic
variety.

The two standard types of inference questions for graphical models are: 

1. the calculation of marginal probabilities:

$$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_1, \ldots, h_m} \mathrm{Prob}(X_1 = h_1, \ldots, X_m = h_m, Y_1 = \sigma_1, \ldots, Y_n = \sigma_n),$$

2. the calculation of maximum a posteriori (MAP) log probabilities:

$$\delta_{\sigma_1 \cdots \sigma_n} = \min_{h_1, \ldots, h_m} -\log(\mathrm{Prob}(X_1 = h_1, \ldots, X_m = h_m, Y_1 = \sigma_1, \ldots, Y_n = \sigma_n)),$$

where the $h_i$ range over all the possible assignments for the hidden random variables $X_i$. Together, these two
primitives can be used to effectively solve a range of other inference problems, including the calculation 
of conditional probabilities and other quantities of interest. The key to inference in graphical models is 
the sum-product algorithm [14] (also known as the generalized distributive law [?]). This polynomial-time
algorithm is used, both in ordinary arithmetic (+, ×) and in tropical arithmetic (min, +), to efficiently solve
Problems 1 and 2. For more background on the sum-product algorithm, and for connections to message
passing and the junction tree algorithm, see [?].
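To make the two arithmetics concrete, here is a minimal Python sketch of the sum-product recursion on a small chain-structured model, written once and run in both (+, ×) and (min, +). The parameter matrices `S` and `T`, the observation `obs`, and all function names are illustrative assumptions and are not taken from the text; the brute-force enumeration is included only as a check.

```python
import math
from itertools import product

def forward(S, T, obs, add, mul):
    """Sum-product recursion on a chain, generic in the arithmetic (add, mul).

    S[h][h2] labels the edge between consecutive hidden states and
    T[h][a] labels the edge from a hidden state to its observation.
    """
    k = len(S)
    msg = [T[h][obs[0]] for h in range(k)]
    for a in obs[1:]:
        new = []
        for h2 in range(k):
            acc = None
            for h in range(k):
                term = mul(mul(msg[h], S[h][h2]), T[h2][a])
                acc = term if acc is None else add(acc, term)
            new.append(acc)
        msg = new
    total = msg[0]
    for m in msg[1:]:
        total = add(total, m)
    return total

def brute_force(S, T, obs, add, mul):
    """Same quantity by explicit enumeration of all hidden assignments."""
    k, n = len(S), len(obs)
    total = None
    for h in product(range(k), repeat=n):
        term = T[h[0]][obs[0]]
        for i in range(1, n):
            term = mul(mul(term, S[h[i - 1]][h[i]]), T[h[i]][obs[i]])
        total = term if total is None else add(total, term)
    return total

# Hypothetical parameters (rows sum to 1) and a hypothetical observation.
S = [[0.7, 0.3], [0.4, 0.6]]
T = [[0.9, 0.1], [0.2, 0.8]]
obs = (0, 1, 1)

plus = lambda x, y: x + y
times = lambda x, y: x * y

marginal = forward(S, T, obs, plus, times)      # Problem 1 in (+, x)

U = [[-math.log(x) for x in row] for row in S]  # negative log parameters
V = [[-math.log(x) for x in row] for row in T]
map_cost = forward(U, V, obs, min, plus)        # Problem 2 in (min, +)
```

The only change between the two calls is the pair of operations passed in, which is the sense in which the same algorithm solves Problems 1 and 2.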

Although the sum-product algorithm provides efficient solutions to the basic inference problems 1 and 
2, it only applies to one coordinate $p_{\sigma_1 \cdots \sigma_n}$ of one distribution at a time. What we are interested in are the
parametric versions of the inference problems. They can be phrased as follows:

3. Find all parameters for a model which result in the same values for all $p_{\sigma_1 \cdots \sigma_n}$.

4. Given observations $Y = \sigma$ and hidden data $X = h$, identify all parameters such that $h$ is the most likely
explanation for the observations $\sigma$.

As we will see, the following modeling questions are fundamentally related to Problems 3 and 4: 

5. Which (parameter independent) relations on the probabilities $p_{\sigma_1 \cdots \sigma_n}$ does the model imply?

6. Describe the tropicalization of the variety corresponding to a graphical model. 

Problem 5 asks for the ideal of polynomial invariants of a statistical model [?]. Invariants have been
investigated in phylogenetics [2, 5] where they can help to identify good trees for aligned DNA sequences.

The primary goal of our study is to give a practical answer to question 4 for graphical models. Our main 
algorithmic result is an efficient procedure for parametric inference that can be viewed as a polytopal analog 
of the sum-product algorithm. The efficiency is based on the complexity estimates for Newton polytopes 
which we derive in Section 4. The resulting polytope propagation algorithm is applied to problems in 
biological sequence analysis in the companion paper [?].

The mathematics to be developed in Sections 3 and 4 is of independent interest. It also furnishes new 
tools for parametric inference (Problems 3 and 4) and parametric modeling (Problems 5 and 6) which are 
applicable to a wide range of statistical problems. We demonstrate this by analyzing the hidden Markov 
model (HMM) and the general Markov model on a binary tree, in Sections 2 and 5 respectively.

2 Algebraic Representation of Hidden Markov Models

A graphical model is an algebraic variety which is presented as the image of a highly structured polynomial
map $f : \mathbb{R}^d \to \mathbb{R}^m$. Here $\mathbb{R}^d$ is the space whose coordinates are the model parameters $s_1, \ldots, s_d$ and $\mathbb{R}^m$ is
the space whose coordinates $p_\sigma = p_{\sigma_1 \cdots \sigma_n}$ are the joint probabilities for the observed random variables. In
applications, the integer $m$ is much larger than the integer $d$; in fact, it is so large that one can only look at
one coordinate $p_\sigma$ at a time. Each coordinate $f_\sigma = f_\sigma(s_1, \ldots, s_d)$ of the map $f$ is a polynomial function in
$s_1, \ldots, s_d$. The efficient evaluation of these functions relies on the sum-product algorithm. Here we study the
(parametric) inference and modeling problems in the familiar context of the hidden Markov model (HMM).

A discrete HMM has $n$ observed states $Y_1, \ldots, Y_n$ taking on $l$ possible values, and $n$ hidden states
$X_1, \ldots, X_n$ taking on $k$ possible values. The HMM can be characterized by the following conditional
independence statements for $i = 1, \ldots, n$:

$$p(X_i \mid X_1, X_2, \ldots, X_{i-1}) = p(X_i \mid X_{i-1}),$$
$$p(Y_i \mid X_1, \ldots, X_i, Y_1, \ldots, Y_{i-1}) = p(Y_i \mid X_i).$$

We consider the homogeneous model with uniform initial distribution, where all transitions $X_i \to X_{i+1}$ are
given by the same $k \times k$-matrix $S = (s_{ij})$ and all transitions $X_i \to Y_i$ are given by the same $k \times l$-matrix
$T = (t_{ij})$. Throughout our discussion we disregard for simplicity the usual probabilistic hypothesis that $S$
and $T$ are non-negative and all row sums are 1.

Proposition 1. The hidden Markov model is the image of a map $f : \mathbb{R}^d \to \mathbb{R}^{l^n}$, where $d = k(k + l)$ and each
coordinate of $f$ is a bi-homogeneous polynomial of degree $n - 1$ in $S$ and degree $n$ in $T$.

Problem 3 is to compute the fibers of the map $f$. In statistics, this is called parameter identification. We
use the term coordinate polynomials for the polynomials $f_\sigma$ that are coordinates of the map $f$.

Our running example in this section is the case $n = 3$ with binary random variables ($k = l = 2$). The
graph of this model is drawn in Figure 1. The shaded nodes are the observed random variables.







Figure 1 : The hidden Markov model of length three. 

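Here the parameter space is $\mathbb{R}^8$ with coordinates $s_{00}, s_{01}, s_{10}, s_{11}, t_{00}, t_{01}, t_{10}, t_{11}$, and it maps to $\mathbb{R}^8$ with
coordinates $p_{000}, p_{001}, p_{010}, p_{011}, p_{100}, p_{101}, p_{110}, p_{111}$. The map $f : \mathbb{R}^8 \to \mathbb{R}^8$ is given by

$$\begin{aligned}
f_{\sigma_1 \sigma_2 \sigma_3} ={}& s_{00} s_{00} t_{0\sigma_1} t_{0\sigma_2} t_{0\sigma_3} + s_{00} s_{01} t_{0\sigma_1} t_{0\sigma_2} t_{1\sigma_3} + s_{01} s_{10} t_{0\sigma_1} t_{1\sigma_2} t_{0\sigma_3} + s_{01} s_{11} t_{0\sigma_1} t_{1\sigma_2} t_{1\sigma_3} \\
&+ s_{10} s_{00} t_{1\sigma_1} t_{0\sigma_2} t_{0\sigma_3} + s_{10} s_{01} t_{1\sigma_1} t_{0\sigma_2} t_{1\sigma_3} + s_{11} s_{10} t_{1\sigma_1} t_{1\sigma_2} t_{0\sigma_3} + s_{11} s_{11} t_{1\sigma_1} t_{1\sigma_2} t_{1\sigma_3}.
\end{aligned}$$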
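The bi-homogeneity asserted in Proposition 1 can be checked numerically. The following Python sketch evaluates a coordinate polynomial by summing over all hidden sequences and confirms that scaling $S$ by $a$ and $T$ by $b$ scales every coordinate by $a^{n-1} b^n$. The function name `hmm_coordinate` and the integer test matrices are illustrative assumptions.

```python
from itertools import product

def hmm_coordinate(S, T, sigma):
    """Evaluate f_sigma: sum over hidden h of prod s_{h_i h_{i+1}} * prod t_{h_i sigma_i}."""
    k, n = len(S), len(sigma)
    total = 0
    for h in product(range(k), repeat=n):
        term = 1
        for i in range(n - 1):
            term *= S[h[i]][h[i + 1]]
        for i in range(n):
            term *= T[h[i]][sigma[i]]
        total += term
    return total

# Hypothetical integer parameters; row sums are not normalized, which the
# text explicitly allows.
S = [[2, 3], [5, 7]]
T = [[1, 4], [6, 2]]
a, b, n = 3, 5, 3
scaled_S = [[a * x for x in row] for row in S]
scaled_T = [[b * x for x in row] for row in T]

bihomogeneous = all(
    hmm_coordinate(scaled_S, scaled_T, sigma)
    == a ** (n - 1) * b ** n * hmm_coordinate(S, T, sigma)
    for sigma in product(range(2), repeat=n)
)

# With every parameter equal to 1 the value counts the monomials of f_sigma.
ones = [[1, 1], [1, 1]]
number_of_terms = hmm_coordinate(ones, ones, (0, 1, 0))
```

Because the arithmetic is exact on integers, the scaling identity is tested with equality rather than a tolerance.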

The hidden Markov model (i.e. the image of $f$) is the zero set of the quartic polynomial

PonPioo~ PmP^w^ PimPa nP^oi ~ PooopjoiPiw + PoooPo iipim~ PooiPoioPm + PmPmPm 

2 2 2 

+/'oio^iooPiii — T^ooiPioo/'iii —/'oooPoii^'iio — /'ooiPoiiPiooPioi — /'OlOPOllPlOOPlOl 
+A)oiA)io7'oii7'iio — Poio/'oii/'ioo/^iio + PooiA)io7'ioi7'iio + /'ooi7^ioo7'ioi7'iio + PoooA)ioA)iiPiii 
— T'ooo/'oii/'ioo/'iii — T'ooo/'ooi/'ioi/'iii +7'ooo7'ioo/'ioi7'iii +7'ooo7'ooi7'iio7'iii — PoooPomPimPwi- 

This polynomial was found by a Gröbner basis computation. See the discussion on implicitization in
[?, §3].

In general, the polynomial functions on $\mathbb{R}^m$ which vanish on the image of $f$ are called invariants of
the model. They form a prime ideal $I_f$. In our example, $I_f$ is generated by the quartic polynomial above.
Problem 5 is to compute generators of the ideal $I_f$. When $l^n$ and $d$ are small, this can be done using Gröbner
bases, and in some cases it is possible to characterize $I_f$ based on the structure of the model (see, for example,
Conjecture 13), but in general Problem 5 is hard and the ideal $I_f$ may remain unknown.

Here is where tropical geometry comes in. The tropicalization of our map $f$ is the map $g : \mathbb{R}^d \to \mathbb{R}^m$
defined by replacing products by sums and sums by minima in the formula for $f$. In our example ($n = 3$, $k = l = 2$),
the tropicalization is the piecewise-linear map $g : \mathbb{R}^8 \to \mathbb{R}^8$, $(U, V) \mapsto \delta$ with

$$\delta_{\sigma_1 \sigma_2 \sigma_3} = \min\{\, u_{h_1 h_2} + u_{h_2 h_3} + v_{h_1 \sigma_1} + v_{h_2 \sigma_2} + v_{h_3 \sigma_3} : (h_1, h_2, h_3) \in \{0, 1\}^3 \,\}. \tag{1}$$

This minimum is attained by the most likely hidden data $(h_1, h_2, h_3)$, given the observations $(\sigma_1, \sigma_2, \sigma_3)$ and
given the parameters $u_{ij} = -\log(s_{ij})$ and $v_{ij} = -\log(t_{ij})$. The sequence $(h_1, h_2, h_3)$ is known as the Viterbi
sequence in the HMM literature [20]. It solves Problem 2 in the Introduction.

The key observation, which we discuss in more detail in Section 4, is that the set of parameters $(U, V)$
which select the Viterbi sequence $(h_1, h_2, h_3)$ is the normal cone at a vertex of the Newton polytope of the
polynomial $f_{\sigma_1 \sigma_2 \sigma_3}$. This polytope is 4-dimensional, it has 8 vertices, and its normal fan represents the
solution to Problem 4 in the Introduction when $\sigma = \sigma_1 \sigma_2 \sigma_3$ is fixed.
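The dimension claim can be checked directly from the exponent vectors of the eight monomials of a coordinate polynomial. The sketch below, in Python with exact rational arithmetic, lists those exponent vectors for each observation and computes the dimension of their affine hull. The coordinate order in `PARAMS` and the helper names are assumptions made for illustration; the sketch does not verify that all eight points are vertices.

```python
from fractions import Fraction
from itertools import product

PARAMS = ["s00", "s01", "s10", "s11", "t00", "t01", "t10", "t11"]

def exponent_vector(h, sigma):
    """Exponent vector of the monomial s_{h1h2} s_{h2h3} t_{h1 s1} t_{h2 s2} t_{h3 s3}."""
    vec = [0] * len(PARAMS)
    for i in range(2):
        vec[PARAMS.index(f"s{h[i]}{h[i + 1]}")] += 1
    for i in range(3):
        vec[PARAMS.index(f"t{h[i]}{sigma[i]}")] += 1
    return tuple(vec)

def affine_dimension(points):
    """Dimension of the affine hull, via exact Gaussian elimination."""
    base = points[0]
    rows = [[Fraction(p[j] - base[j]) for j in range(len(base))] for p in points[1:]]
    rank = 0
    for col in range(len(base)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# For each observation: (number of distinct exponent vectors, affine dimension).
dims = {}
for sigma in product(range(2), repeat=3):
    pts = [exponent_vector(h, sigma) for h in product(range(2), repeat=3)]
    dims[sigma] = (len(set(pts)), affine_dimension(pts))
```

Each observation yields eight distinct exponent vectors spanning a 4-dimensional affine space inside the 8-dimensional parameter lattice.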

We can also consider an extension of Problem 4 where $\sigma = \sigma_1 \sigma_2 \sigma_3$ ranges over all possible observations.
The solution is given by the Newton polytope of the map $f$. In our example, this is a 5-dimensional polytope
with 398 vertices, 1136 edges, 1150 two-faces, 478 three-faces and 68 facets, namely, the Minkowski sum
of eight copies of the earlier 4-dimensional polytope for $(\sigma_1, \sigma_2, \sigma_3) \in \{0, 1\}^3$. For a concrete numerical
example, fix the parameters $U^* = \begin{pmatrix} 6 & 5 \\ 8 & 1 \end{pmatrix}$ and $V^* = \begin{pmatrix} 0 & 8 \\ 8 & 8 \end{pmatrix}$. We find:

if the observed string at $Y_1 Y_2 Y_3$ is   $\sigma_1 \sigma_2 \sigma_3$ = 000 001 010 011 100 101 110 111
then the Viterbi sequence at $X_1 X_2 X_3$ is $h_1 h_2 h_3$ = 000 001 000 011 000 111 110 111
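The table can be reproduced by brute force from formula (1). In the Python sketch below, the entries of $U^*$ and $V^*$, taken as rows (6, 5), (8, 1) and (0, 8), (8, 8), are an assumption: they are values for which the computation returns exactly the table above. The function names are likewise illustrative.

```python
from itertools import product

# Assumed parameter values u_{ij} = -log(s_{ij}) and v_{ij} = -log(t_{ij}).
U = [[6, 5], [8, 1]]
V = [[0, 8], [8, 8]]

def cost(h, sigma):
    """The linear form u_{h1h2} + u_{h2h3} + v_{h1 s1} + v_{h2 s2} + v_{h3 s3}."""
    return (U[h[0]][h[1]] + U[h[1]][h[2]]
            + sum(V[h[i]][sigma[i]] for i in range(3)))

def viterbi(sigma):
    """All hidden sequences attaining the minimum in formula (1)."""
    hidden = list(product(range(2), repeat=3))
    best = min(cost(h, sigma) for h in hidden)
    return [h for h in hidden if cost(h, sigma) == best]

def as_string(bits):
    return "".join(str(b) for b in bits)

table = {}
for sigma in product(range(2), repeat=3):
    winners = viterbi(sigma)
    assert len(winners) == 1          # the minimum is unique for these parameters
    table[as_string(sigma)] = as_string(winners[0])
```

For these values the minimum in (1) is attained by a single hidden sequence for every observation, so the parameters lie in the interior of one cone.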

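The set of all parameters $(U, V)$ leading to the same conclusions as $(U^*, V^*)$ is the cone defined by

$$u_{01} - u_{00} + v_{11} - v_{01} < 0, \quad u_{10} - u_{11} + v_{00} - v_{10} < 0, \quad u_{00} + v_{01} - u_{10} - v_{11} < 0,$$

$$2u_{00} + v_{01} - u_{01} - u_{10} - v_{11} < 0, \quad 2u_{11} + v_{10} + v_{11} - u_{00} - u_{01} - v_{00} - v_{01} < 0.$$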

Our solution to the parametric inference problem with respect to all observations simultaneously consists
of 398 such cones. The tropical HMM is the union of the images of these cones under the piecewise-linear
map $g : (U, V) \mapsto \delta$. This image is a piecewise-linear set of dimension 7. The cone which contains the
chosen parameters $(U^*, V^*)$ is mapped to a 7-dimensional cone in the tropical HMM (it spans the hyperplane
$\delta_{010} = \delta_{100}$) but most of the other 397 cones are mapped to lower-dimensional cones by the map $g$. The
question of how the number 398 grows as the length $n$ increases will be addressed in Corollary 10.

3 Positivity and Morphisms in Tropical Geometry

We have seen that a graphical model is the image of a polynomial map $f$ from the space of parameters to
the space of joint probability distributions on the observed random variables. Furthermore, we have seen
that the tropicalization of $f$ arises naturally in solving Problem 4. In this section we study the geometry of
tropicalization in the more general setting where $f : \mathbb{R}^d \to \mathbb{R}^m$ is an arbitrary polynomial map. In statistical
applications, it is usually the case that each coordinate $f_\sigma$ of the map $f$ is a polynomial with positive coefficients.
If this holds then the polynomial map $f$ is called positive. We say that $f$ is surjectively positive if, in
addition, $f$ maps the positive orthant surjectively onto the positive points in the image, in symbols,

$$f(\mathbb{R}^d_{>0}) = \mathrm{image}(f) \cap \mathbb{R}^m_{>0}. \tag{2}$$

The set of all polynomial functions which vanish on the image of $f$ is a prime ideal $I_f$ in the polynomial
ring $\mathbb{R}[p_1, \ldots, p_m]$. The closure of the image of $f$ is the variety of the prime ideal $I_f$.

In tropical geometry, we replace the variety of $I_f$ by a piecewise-linear set as follows. The tropical
variety $\mathcal{T}(I_f)$ is the set of all weight vectors $w \in \mathbb{R}^m$ such that the initial ideal $\mathrm{in}_w(I_f)$ contains no monomial
[17, 22]. Following [?], we define the positive tropical variety $\mathcal{T}^+(I_f)$ as the set of all weight vectors
$w \in \mathbb{R}^m$ such that the initial ideal $\mathrm{in}_w(I_f)$ contains no polynomial with only positive coefficients. The tropical
variety $\mathcal{T}(I_f)$ is a polyhedral fan in $\mathbb{R}^m$, and $\mathcal{T}^+(I_f)$ is a polyhedral subcomplex of $\mathcal{T}(I_f)$. This means that
$\mathcal{T}(I_f)$ is a finite union of closed convex polyhedral cones that fit together nicely, and $\mathcal{T}^+(I_f)$ is the union of a
subset of these cones. The tropicalization of the polynomial map $f$ is the piecewise-linear map $g : \mathbb{R}^d \to \mathbb{R}^m$
defined by replacing products by sums and sums by minima in the evaluation of $f$. We say that $g$ is a tropical
morphism. Examples of tropical morphisms appear in displayed formulas such as (1), (3), (4) and (5).

The following theorem describes the geometry of this situation. We define the Newton polytope of a
polynomial map $f : \mathbb{R}^d \to \mathbb{R}^m$ as the Minkowski sum in $\mathbb{R}^d$ of the Newton polytopes of its coordinates
$f_1, \ldots, f_m$. For basics on Newton polytopes and their normal fans see [?, §1].

Theorem 2. The tropical morphism $g$ is linear on each cone in the normal fan of the Newton polytope of $f$.
Its image is a fan contained in $\mathcal{T}(I_f)$. If $f$ is positive then $\mathrm{image}(g)$ is a subset of $\mathcal{T}^+(I_f)$, but it is generally
not a polyhedral subcomplex. If $f$ is surjectively positive then $\mathrm{image}(g) = \mathcal{T}^+(I_f)$.

Proof. Let $P_i$ denote the Newton polytope of the polynomial $f_i = f_i(s_1, \ldots, s_d)$. By definition, $P_i$ is the
convex hull in $\mathbb{R}^d$ of all non-negative lattice points $a = (a_1, \ldots, a_d) \in \mathbb{N}^d$ such that the monomial $s_1^{a_1} \cdots s_d^{a_d}$
appears with non-zero coefficient in $f_i$. The piecewise-linear concave function $g_i$ is the support function of
the polytope $P_i$. This means that $g_i(w)$ is the minimum value attained on $P_i$ by the linear functional $a \mapsto w \cdot a$.
In particular, the function $g_i : \mathbb{R}^d \to \mathbb{R}$ is linear on each cone in the normal fan of $P_i$.

The Newton polytope of the map $f$ is the Minkowski sum $P_1 + \cdots + P_m = \{a_1 + \cdots + a_m : a_i \in P_i\}$. The
normal fan of $P_1 + \cdots + P_m$ is the common refinement of the normal fans of $P_1, \ldots, P_m$. This shows that the
function $g = (g_1, \ldots, g_m) : \mathbb{R}^d \to \mathbb{R}^m$ is linear on each cone of the normal fan of the Newton polytope of $f$.
Since $g$ is continuous, the image of $g$ is a closed polyhedral fan in $\mathbb{R}^m$.

Consider any vector $w \in \mathbb{R}^d$. We must show that $g(w)$ lies in $\mathcal{T}(I_f)$, and if $f$ is positive then $g(w)$ lies in
$\mathcal{T}^+(I_f)$. Let $\phi$ be any polynomial in the ideal $I_f$. If we substitute $p_1 = f_1, \ldots, p_m = f_m$ into $\phi = \phi(p_1, \ldots, p_m)$
then we get zero. Consequently, if we substitute the initial forms $p_1 = \mathrm{in}_w(f_1), \ldots, p_m = \mathrm{in}_w(f_m)$ into the
initial form $\mathrm{in}_{g(w)}(\phi)$ then the result is zero. See equation (11.2) on page 100 in [?]. This implies that
$\mathrm{in}_{g(w)}(\phi)$ is not a monomial. Moreover, if $f$ is positive then $\mathrm{in}_{g(w)}(\phi)$ must have two terms whose coefficients have
opposite signs. This implies the desired conclusion.

The following example shows that $\mathrm{image}(g)$ need not be a subcomplex of $\mathcal{T}^+(I_f)$. If $f$ is assumed to be
surjectively positive, then it follows from [?, Proposition 2.5] that $\mathrm{image}(g) = \mathcal{T}^+(I_f)$. □

Example 3. Let $d = 3$, $m = 4$ and consider the linear map

$$f : \mathbb{R}^3 \to \mathbb{R}^4, \quad (s_1, s_2, s_3) \mapsto (s_1 + s_2 + s_3,\; s_1 + 2 s_2 + s_3,\; s_2 + s_3,\; s_3).$$

Then $I_f$ is the principal ideal generated by the linear form $p_1 - p_2 + p_3 - p_4$, and $\mathcal{T}(I_f)$ is essentially the
normal fan of a tetrahedron. We identify $\mathcal{T}(I_f)$ with the complete graph $K_4$. The six edges of $K_4$ are labeled
with six monomial-free initial ideals of $I_f$, namely,

$$\langle p_1 + p_3 \rangle, \; \langle -p_2 - p_4 \rangle, \; \langle p_1 - p_2 \rangle, \; \langle p_1 - p_4 \rangle, \; \langle -p_2 + p_3 \rangle, \; \langle p_3 - p_4 \rangle.$$

The first two of these six initial ideals contain a polynomial with positive coefficients. Hence the positive
tropical variety $\mathcal{T}^+(I_f)$ is the four-cycle in $K_4$ formed by the remaining four edges.

The tropicalization of the linear map $f$ is the tropical morphism

$$g : \mathbb{R}^3 \to \mathbb{R}^4, \quad (u_1, u_2, u_3) \mapsto (\min(u_1, u_2, u_3), \min(u_1, u_2, u_3), \min(u_2, u_3), u_3). \tag{3}$$

The image of $g$ is the set of all vectors $(a, a, b, c)$ with $a \le b \le c$. Each vector $(a, a, b, c)$ with $a < b < c$
has the initial ideal $\langle p_1 - p_2 \rangle$, so it lies on a particular edge of $K_4$. But the same edge also accounts for all
vectors $(a, a, b, c)$ with $a < c < b$, none of which is in the image of $g$. Thus $\mathrm{image}(g)$ is a closed segment
which covers only half of the edge of $K_4$ indexed by $\langle p_1 - p_2 \rangle$.

Here it is easy to replace $f$ by a parameterization $f'$ which is surjectively positive, for instance,

$$f' : \mathbb{R}^4 \to \mathbb{R}^4, \quad (s_1, s_2, s_3, s_4) \mapsto (s_1 + s_3,\; s_1 + s_4,\; s_2 + s_4,\; s_2 + s_3),$$

$$g' : \mathbb{R}^4 \to \mathbb{R}^4, \quad (u_1, u_2, u_3, u_4) \mapsto (\min(u_1, u_3), \min(u_1, u_4), \min(u_2, u_4), \min(u_2, u_3)). \tag{4}$$

We have $I_{f'} = I_f$ but now the tropical morphism $g'$ maps onto the entire four-cycle $\mathcal{T}^+(I_f)$. □
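Example 3 is small enough to test numerically. The Python sketch below assumes that the second coordinate of $f$ is $s_1 + 2 s_2 + s_3$, which is the reading under which $p_1 - p_2 + p_3 - p_4$ vanishes on the image. It checks that this linear form vanishes on both parameterizations, that every value of $g$ has the shape $(a, a, b, c)$ with $a \le b \le c$, and that a vector with $a < c < b$ is nevertheless reached by $g'$. The sample points and the witness are illustrative choices.

```python
import random

def f(s1, s2, s3):
    # Assumed reading of the second coordinate: s1 + 2*s2 + s3.
    return (s1 + s2 + s3, s1 + 2 * s2 + s3, s2 + s3, s3)

def f_prime(s1, s2, s3, s4):
    return (s1 + s3, s1 + s4, s2 + s4, s2 + s3)

def g(u1, u2, u3):
    # Tropicalization of f: sums become minima, coefficients are dropped.
    return (min(u1, u2, u3), min(u1, u2, u3), min(u2, u3), u3)

def g_prime(u1, u2, u3, u4):
    return (min(u1, u3), min(u1, u4), min(u2, u4), min(u2, u3))

def invariant(p):
    """The generator p1 - p2 + p3 - p4 of the ideal I_f."""
    return p[0] - p[1] + p[2] - p[3]

rng = random.Random(0)
points3 = [tuple(rng.randint(-9, 9) for _ in range(3)) for _ in range(200)]
points4 = [tuple(rng.randint(-9, 9) for _ in range(4)) for _ in range(200)]

vanishes = (all(invariant(f(*s)) == 0 for s in points3)
            and all(invariant(f_prime(*s)) == 0 for s in points4))

# Every image point of g has the form (a, a, b, c) with a <= b <= c.
ordered = all(w[0] == w[1] <= w[2] <= w[3] for w in (g(*u) for u in points3))

# A vector (a, a, b, c) with a < c < b: not of the form above, yet hit by g'.
target = (0, 0, 2, 1)
witness = g_prime(0, 2, 1, 2)
```

The third coordinate of $g$ is $\min(u_2, u_3)$ and the fourth is $u_3$, so no input can produce a third coordinate larger than the fourth; this is why `target` is out of reach for $g$.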

In the rest of this section we examine Theorem 2 for a small but important graphical model, namely,
the naive Bayes model with two features [?, §7]. There are two observed random variables $Y_1$ and $Y_2$
dependent on one hidden binary random variable $X$. The two observed variables take $k$ and $l$ possible values
respectively. The parameterization $f$ of this model is the map $f : \mathbb{R}^{2(k+l)} \to \mathbb{R}^{k \times l}$ given by

$$p_{ij} = s_{i0} t_{0j} + s_{i1} t_{1j}.$$

Thus the model consists of all $k \times l$-matrices $P = (p_{ij})$ of the form $P = S \cdot T$ where $S$ is a $k \times 2$-matrix and
$T$ is a $2 \times l$-matrix, i.e., the model consists of precisely the $k \times l$-matrices of rank $\le 2$.

Proposition 4. The parameterization $f$ of the naive Bayes model with two features is surjectively positive.
The ideal $I_f$ is generated by the $3 \times 3$-subdeterminants of the $k \times l$-matrix $P = (p_{ij})$.

Proof. The map $f$ being surjectively positive means that if $P$ is any positive matrix of rank 2 then $S$ and $T$ can be chosen
to be positive. This is a known result in linear algebra (see e.g. [?]). The same statement is false for rank $\ge 3$,
i.e., the parameterization of the naive Bayes model with three or more features is not surjectively positive. A
well-known result in commutative algebra states that the $(r + 1) \times (r + 1)$-minors of a $k \times l$-matrix generate
a prime ideal. The variety of this ideal is the set of $k \times l$-matrices of rank $\le r$. This is our ideal $I_f$ for $r = 2$. □

The objects of Theorem 2 have been studied in [8] and [9]. The tropical variety $\mathcal{T}(I_f)$ is the set of
$k \times l$-matrices of tropical rank $\le 2$, and the positive tropical variety $\mathcal{T}^+(I_f) = \mathrm{image}(g)$ is the set of $k \times l$-matrices of
Barvinok rank $\le 2$. Develin [9] determines the combinatorics and topology of these spaces when $\min(k, l) = 3$.
He shows that $\mathcal{T}(I_f)$ is shellable but $\mathcal{T}^+(I_f)$ can have torsion in its integral homology groups.

The Newton polytope of the map $f$ is an interesting combinatorial object, namely, it is the $(k + l - 1)$-dimensional
zonotope associated with the complete bipartite graph $K_{k,l}$. The Newton polytope of each
coordinate $f_{ij}$ is a line segment, and the zonotope is their Minkowski sum. The normal fan is the hyperplane
arrangement $\{u_{i0} - u_{i1} = v_{1j} - v_{0j}\}$. Its maximal cones correspond to the acyclic orientations of the complete
bipartite graph $K_{k,l}$. West [25] showed that the number of facets of such a cone can be any integer between
$k + l - 1$ and $kl$. The total number of maximal cones equals $\sum_{i=1}^{k} S(k, i) (-1)^{k+i} \, i! \, (i+1)^l$, where $S(k, i)$ is the Stirling
number of the second kind. Here, the tropical morphism $g$ is given by

$$g_{ij} = \min(u_{i0} + v_{0j}, u_{i1} + v_{1j}). \tag{5}$$

The map $g$ is piecewise-linear with respect to the hyperplane arrangement. Recent work of
Federico Ardila (in preparation) gives a complete classification of all fibers of $g$.
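The count of maximal cones can be cross-checked against a direct enumeration of acyclic orientations. The Python sketch below evaluates the Stirling-number sum, read here as $\sum_{i=1}^{k} S(k,i)(-1)^{k+i}\, i!\, (i+1)^l$ (an assumed reading of the formula), and compares it with a brute-force count over all orientations of the edges of the complete bipartite graph. All function names are illustrative.

```python
from itertools import product
from math import factorial

def stirling2(n, k):
    """Stirling number of the second kind, by the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def cones_by_formula(k, l):
    return sum(stirling2(k, i) * (-1) ** (k + i) * factorial(i) * (i + 1) ** l
               for i in range(1, k + 1))

def acyclic_orientations(k, l):
    """Count orientations of K_{k,l} without a directed cycle (brute force)."""
    edges = [(i, k + j) for i in range(k) for j in range(l)]
    count = 0
    for bits in product((0, 1), repeat=len(edges)):
        succ = {v: [] for v in range(k + l)}
        indeg = [0] * (k + l)
        for (a, b), bit in zip(edges, bits):
            src, dst = (a, b) if bit else (b, a)
            succ[src].append(dst)
            indeg[dst] += 1
        # Kahn's algorithm: the orientation is acyclic iff every vertex
        # can be removed in topological order.
        stack = [v for v in range(k + l) if indeg[v] == 0]
        removed = 0
        while stack:
            v = stack.pop()
            removed += 1
            for w in succ[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    stack.append(w)
        if removed == k + l:
            count += 1
    return count

formula_33 = cones_by_formula(3, 3)
brute_33 = acyclic_orientations(3, 3)
```

For $k = l = 3$ both computations give 230, the number of vertices of the zonotope discussed in Example 5.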




Figure 2: The tropical variety and positive tropical variety of the 3 x 3-determinant.

Example 5. Let $k = l = 3$, so the two observed random variables are ternary. The prime ideal is

$$I_f = \langle\, p_{11} p_{22} p_{33} - p_{11} p_{23} p_{32} - p_{12} p_{21} p_{33} + p_{12} p_{23} p_{31} + p_{13} p_{21} p_{32} - p_{13} p_{22} p_{31} \,\rangle.$$

The tropical variety $\mathcal{T}(I_f)$ is the fan over a two-dimensional polyhedral complex consisting of six triangles
and nine quadrangles. This complex is the 2-skeleton of the product of two triangles, labeled as in Figure
2a. This complex is shellable. The positive tropical variety $\mathcal{T}^+(I_f)$ is the subcomplex consisting of the nine
quadrangles shown in Figure 2b. Note that $\mathcal{T}^+(I_f)$ is a torus.

The Newton polytope of $f$ is a five-dimensional zonotope with 230 vertices, one for each acyclic orientation
of the complete bipartite graph $K_{3,3}$. The map $g$ is linear on each of the 230 cones in the corresponding
hyperplane arrangement, but it is rank-deficient on 68 of the cones. The remaining $162 = 18 \times 9$ cones are
mapped onto the 9 quadrangles of the torus $\mathcal{T}^+(I_f)$. Thus the general fiber of $g$ involves 18 cones. Of these,
eight cones have 5 facets, eight cones have 6 facets, and two cones have 9 facets. □
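For $k = l = 3$, membership in the tropical variety of the determinant can be tested directly: a $3 \times 3$ matrix lies on it when the minimum of the six permutation sums is attained at least twice. The Python sketch below computes one image point of the tropical morphism $g_{ij} = \min(u_{i0} + v_{0j}, u_{i1} + v_{1j})$ and checks this condition. The integer matrices `U` and `V` are hypothetical inputs, and the function names are illustrative.

```python
from itertools import permutations

def naive_bayes_tropical(U, V):
    """g_ij = min(u_i0 + v_0j, u_i1 + v_1j) for a k x 2 matrix U and a 2 x l matrix V."""
    return [[min(U[i][0] + V[0][j], U[i][1] + V[1][j])
             for j in range(len(V[0]))]
            for i in range(len(U))]

def permutation_sums(M):
    """Sorted list of the sums M[0][p0] + M[1][p1] + ... over all permutations p."""
    n = len(M)
    return sorted(sum(M[i][p[i]] for i in range(n))
                  for p in permutations(range(n)))

# Hypothetical logarithmic parameters.
U = [[0, 3], [1, 1], [4, 0]]
V = [[0, 2, 5], [3, 1, 0]]

M = naive_bayes_tropical(U, V)
sums = permutation_sums(M)
minimum_attained_twice = sums[0] == sums[1]
```

Here `M` is a matrix of Barvinok rank at most 2 by construction, and the two smallest permutation sums coincide, as membership in the tropical variety requires.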

4 Newton Polytopes of Graphical Models and their Complexity 

Consider a graphical model with $E$ edges and $n$ observed random variables $Y_1, \ldots, Y_n$ each taking $l$ values.
Such a model is given by a positive polynomial map $f : \mathbb{R}^d \to \mathbb{R}^{l^n}$. Each coordinate of $f$ is a polynomial
of degree $E$ in the model parameters $s_1, \ldots, s_d$. In this section we discuss the statistical meaning and the
computational complexity of the mathematical objects introduced in the previous section.

We write $u_i = -\log(s_i)$ for the negative logarithms of the model parameters. Consider any of the $l^n$
possible observations $\sigma$. The quantity $f_\sigma(s_1, \ldots, s_d)$ is the probability of making this particular observation,
i.e. it is $\mathrm{Prob}(Y = \sigma)$. The quantity $g_\sigma(u_1, \ldots, u_d)$ is the negative logarithm of the joint probability
$\mathrm{Prob}(X = h, Y = \sigma)$ where $h$ maximizes $\mathrm{Prob}(X = h \mid Y = \sigma)$ for the parameters $(s_1, \ldots, s_d)$. Clearly, the
function $g_\sigma : \mathbb{R}^d \to \mathbb{R}$ is piecewise-linear and concave on the logarithmic parameter space.

The domains of linearity of the function $g_\sigma$ are the cones in the normal fan of the Newton polytope of
$f_\sigma$. Each maximal cone $C$ is indexed by the hidden data $h$ that maximizes $\mathrm{Prob}(X = h \mid Y = \sigma)$ for any of the
parameters $(u_1, \ldots, u_d) \in C$. The hidden data $h$ which arise in this manner, for some choice of logarithmic
parameters $u$, are called the possible explanations of the observation $\sigma$. For instance, for the hidden Markov
model of Section 2, the explanations are the Viterbi sequences.

Let us now vary the observations. Each logarithmic parameter vector $u$ defines an inference function
$\sigma \mapsto h$ from the set of observations to the set of explanations. For the HMM, each inference function
$\{1, \ldots, l\}^n \to \{1, \ldots, k\}^n$ takes an observed sequence $\sigma$ to the corresponding Viterbi sequence $h$. There are
$(k^n)^{l^n} = k^{n l^n}$ such functions, but most of these are not inference functions. For instance, consider the binary
HMM of length three. There are $8^8 = 16{,}777{,}216$ Boolean functions $\{0, 1\}^3 \to \{0, 1\}^3$, but, as we have
seen at the end of Section 2, only 398 of these are inference functions for the HMM.

Proposition 6. The inference functions $\sigma \mapsto h$ of a graphical model $f$ are in bijection with the vertices of
the Newton polytope of the map $f$. The explanations $h$ for a fixed observation $\sigma$ in a graphical model are in
bijection with the vertices of the Newton polytope of the polynomial $f_\sigma$.

In applications of graphical models, the number $d$ of parameters and the number $l$ of values of the
observed random variables are small and fixed, but the number $n$ of observed random variables is large. Recall
that the model is the image of the map $f : \mathbb{R}^d \to \mathbb{R}^{l^n}$. Hence the dimension of the model remains fixed but
the dimension of its ambient space grows exponentially in $n$. It is therefore algorithmically infeasible to
compute the full tropical variety $\mathcal{T}(I_f)$. What we can do efficiently, however, is to compute the Newton
polytopes of the $f_\sigma$, or even the Newton polytope of $f$. This allows us to glean information about the
tropical variety from the domains of linearity of its "coordinate functions" $g_\sigma$.

Our next goal is to derive an upper bound on the number of vertices of the Newton polytopes. 

Theorem 7. Consider graphical models $f$ whose number of parameters $d$ is fixed and whose number $n$ of
observed random variables and number of edges $E$ vary. (Typically, $E$ is a linear function of $n$.) Then the
number of vertices of the Newton polytope $NP(f_\sigma)$ of $f_\sigma$ is bounded above by

$$\#\mathrm{vertices}(NP(f_\sigma)) \le \mathrm{constant} \cdot E^{d(d-1)/(d+1)} \le \mathrm{constant} \cdot E^{d-1}.$$

For many important families of graphical models, the number $E$ of edges is bounded by a linear function
in terms of the number $n$ of observed nodes, and in those cases we can replace $E$ by $n$. Hence, for any given
observation $\sigma$, the number of explanations grows polynomially in $n$. For instance, in the hidden Markov
model of Section 2 we have $E = 2n - 1$, and a similar relationship holds in the tree model of Section 5.

Corollary 8. For any fixed observation in the homogeneous HMM, the number of explanations is at most 
Ckj • n^^^^^\ If all random variables are binary then the upper bound C ■ n^'^^^ holds. 

The proofs of Theorem 7 and Corollary 8 are derived from the following classical result on lattice polytopes
due to Andrews [?]. The necessary observation is that the Newton polytope of $f_\sigma$ is contained in the
cube $[0, E]^d$ and the volume of this cube equals $E^d$.

Proposition 9. (Andrews [?]) For every fixed integer $d$ there exists a constant $C_d$ such that the number of
vertices of any lattice polytope $P$ in $\mathbb{R}^d$ is bounded above by $C_d \cdot \mathrm{volume}(P)^{(d-1)/(d+1)}$.
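The step from Proposition 9 to a polynomial bound in $E$ is a short computation. The following is a sketch, assuming only the containment of the Newton polytope in the cube $[0, E]^d$ noted above, monotonicity of volume under inclusion, and $E \ge 1$:

```latex
\#\mathrm{vertices}\bigl(NP(f_\sigma)\bigr)
  \;\le\; C_d \cdot \mathrm{volume}\bigl(NP(f_\sigma)\bigr)^{(d-1)/(d+1)}
  \;\le\; C_d \cdot \bigl(E^{d}\bigr)^{(d-1)/(d+1)}
  \;=\; C_d \cdot E^{\,d(d-1)/(d+1)}
  \;\le\; C_d \cdot E^{\,d-1}.
```

The last inequality uses $d/(d+1) < 1$. Since $d$ is fixed, the bound is polynomial in $E$, and hence in $n$ whenever $E$ grows linearly with $n$.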

The Newton polytope of the map $f$ was defined as the Minkowski sum of the $l^n$ smaller Newton polytopes
in Theorem 7. From this we infer the following naive bound on its number of vertices.

Corollary 10. The number of inference functions of a graphical model is at most P'^'iE'' hence this number 
scales at most singly exponentially in the complexity {n,E) of the graphical model. 

Consider the homogeneous HMM on binary random variables. Each inference function is a Boolean
function $\{0, 1\}^n \to \{0, 1\}^n$, but not conversely. The number of all Boolean functions is $2^{n 2^n}$, which grows
doubly exponentially in $n$. However, the number of inference functions is at most $2^{\mathrm{polynomial}(n)}$.

In practical applications of graphical models, it may be infeasible to compute all (singly-exponentially 
many) inference functions. Nonetheless, we believe that important insight can be gained by computing and 
classifying the Newton polytopes of graphical models / on few random variables. Such a study would be 
the polyhedral analogue to the algebraic classification of [12].

On the other hand, for a fixed observation $\sigma$, the size of the Newton polytope of $f_\sigma$ grows polynomially
with the size of the graphical model, and therefore there is hope that the polytopes can be computed
efficiently. Despite the fact that the Newton polytope of $f_\sigma$ has polynomially many vertices in the size of
the graphical model, the number of terms in $f_\sigma$ grows exponentially. This is a potential problem because
the computation of the Newton polytope requires inspecting these terms. The following result states that,
in fact, the convex hull computation scales with the running time of the sum-product algorithm, which for
many models of interest scales polynomially with the size of the graphical model.

Proposition 11 (Polytope propagation). The Newton polytopes of the polynomials $f_\sigma$ can be computed
recursively using the decomposition of $f_\sigma$ according to the sum-product algorithm.
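The recursion behind Proposition 11 can be sketched for the binary HMM of Section 2. In the sum-product recursion, each product is replaced by a Minkowski sum of exponent vectors and each sum by a union of point sets. The Python sketch below is a simplification and not the algorithm of the companion paper: it keeps every exponent vector instead of pruning to the vertices of the convex hull after each step, which is the pruning that a full implementation would add. The names `PARAMS`, `unit`, `propagate` and `enumerate_terms` are illustrative assumptions.

```python
from itertools import product

PARAMS = ("s00", "s01", "s10", "s11", "t00", "t01", "t10", "t11")

def unit(name):
    """Exponent vector of the single parameter `name`."""
    return tuple(1 if p == name else 0 for p in PARAMS)

def plus(a, b):
    """Exponent vector of a product of two monomials (one Minkowski-sum step)."""
    return tuple(x + y for x, y in zip(a, b))

def propagate(sigma):
    """Forward recursion with (union, Minkowski sum) in place of (+, x)."""
    # P[h]: exponent vectors of the partial monomials whose last hidden state is h.
    P = [{unit(f"t{h}{sigma[0]}")} for h in range(2)]
    for a in sigma[1:]:
        P = [
            {plus(plus(p, unit(f"s{h}{h2}")), unit(f"t{h2}{a}"))
             for h in range(2) for p in P[h]}
            for h2 in range(2)
        ]
    return P[0] | P[1]

def enumerate_terms(sigma):
    """Exponent vectors obtained by listing every hidden sequence explicitly."""
    points = set()
    for h in product(range(2), repeat=len(sigma)):
        vec = unit(f"t{h[0]}{sigma[0]}")
        for i in range(1, len(sigma)):
            vec = plus(plus(vec, unit(f"s{h[i - 1]}{h[i]}")),
                       unit(f"t{h[i]}{sigma[i]}"))
        points.add(vec)
    return points

agrees = all(propagate(s) == enumerate_terms(s)
             for s in product(range(2), repeat=3))
long_sigma = (0,) * 12
support_size = len(propagate(long_sigma))
```

Even without pruning, distinct hidden sequences often share an exponent vector, so the propagated sets stay much smaller than the $2^n$ terms of the polynomial; the convex hull of the returned set is the Newton polytope of $f_\sigma$.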

Taken together, Theorem 7 and Proposition 11 say that polytope propagation is an efficient algorithm
for parametric inference with graphical models. This statement is thesis (c) in our companion paper
[?]. In that paper, the sum-product algorithm and the polytope propagation algorithm are explained and
analyzed in more detail. We also demonstrate the practicality of our mathematical theory by explicitly
computing (and statistically interpreting) various high-dimensional Newton polytopes for graphical models
that arise in biological sequence analysis.

5 The General Markov Model on a Binary Tree

We conclude by illustrating the concepts we have developed in the context of tree Markov models. These are
directed graphical models where the graph is a directed tree $T$ with observed random variables $Y_1, \ldots, Y_n$ at
the leaves. The naive Bayes model in Section 3 is the special case where $n = 2$. Each edge $e$ has a different
transition matrix $S^e = (s^e_{ij})$. We consider the general model in Allman and Rhodes [?], which means that the
$S^e$ are arbitrary distinct $l \times l$-matrices. In most applications, the transition matrices are from a special model
family (e.g. in phylogenetics these may be the Jukes-Cantor model or the Hasegawa-Kishino-Yano model). As
before, we relax the hypothesis that transition probabilities are non-negative and sum to 1. Hence the $s^e_{ij}$ are
distinct unknowns. For simplicity we shall further assume that the tree $T$ is binary.

Proposition 12. The general Markov model for the binary tree $T$ is the image of a map $f : \mathbb{R}^{(2n-2) l^2} \to \mathbb{R}^{l^n}$,
where each coordinate of $f$ is a multilinear polynomial in the unknowns $\{s^e_{ij} : e \text{ edge of } T\}$.

If we denote an edge between nodes $i$ and $j$ by $(ij)$ and $T'$ is the tree $T$ without the leaves, then the
coordinate of the multilinear map $f$ indexed by an observed sequence $(\sigma_1, \ldots, \sigma_n)$ can be written as follows:

$$p_{\sigma_1 \cdots \sigma_n} = \sum_{h} \; \prod_{\substack{i \in T' \\ \text{with children } j, k}} s^{(ij)}_{h_i h_j} \, s^{(ik)}_{h_i h_k}. \tag{6}$$

Here $h$ ranges over all colorations $h = (h_i)_{i \in T}$ of the nodes such that $h_j = \sigma_j$ for all leaves $j$. Our running
example in this section is the binary tree in Figure 3 with binary random variables ($l = 2$).




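Figure 3: A directed binary tree with n = 4 leaves.

In this example, the coordinates of the multilinear map $f : \mathbb{R}^{24} \to \mathbb{R}^{16}$ are given by the formula

$$p_{\sigma_1 \sigma_2 \sigma_3 \sigma_4} = \sum_{(h_5, h_6, h_7) \in \{0, 1\}^3} s^{(75)}_{h_7 h_5} \, s^{(76)}_{h_7 h_6} \, s^{(51)}_{h_5 \sigma_1} \, s^{(52)}_{h_5 \sigma_2} \, s^{(63)}_{h_6 \sigma_3} \, s^{(64)}_{h_6 \sigma_4}. \tag{7}$$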

The prime ideal $I_f$ of polynomial invariants is generated by the 3 × 3-subdeterminants of the matrix

$$\begin{pmatrix}
p_{0000} & p_{0001} & p_{0010} & p_{0011} \\
p_{0100} & p_{0101} & p_{0110} & p_{0111} \\
p_{1000} & p_{1001} & p_{1010} & p_{1011} \\
p_{1100} & p_{1101} & p_{1110} & p_{1111}
\end{pmatrix} \tag{8}$$

Thus this particular model is the k = l = 4 instance of the determinantal variety in Proposition 3.
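The vanishing of these subdeterminants can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the tree of Figure 3 has root 7, internal nodes 5 and 6, and leaves 1 to 4 (as the edge labels in the coordinate formula suggest), and it uses random integer parameters, which is allowed because the transition entries are treated as free unknowns. The function names are hypothetical.

```python
from itertools import combinations, product
import random

# Edges (parent, child) of the assumed 4-leaf tree: root 7, internal nodes 5, 6.
EDGES = [(7, 5), (7, 6), (5, 1), (5, 2), (6, 3), (6, 4)]

def coordinate(s, sigma):
    """Evaluate p_sigma as the 8-term sum over (h5, h6, h7) in {0,1}^3."""
    total = 0
    for h5, h6, h7 in product((0, 1), repeat=3):
        state = {1: sigma[0], 2: sigma[1], 3: sigma[2], 4: sigma[3],
                 5: h5, 6: h6, 7: h7}
        term = 1
        for i, j in EDGES:
            term *= s[(i, j)][state[i]][state[j]]
        total += term
    return total

def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def flattening_12_34(p):
    """4 x 4 matrix with rows indexed by (sigma1, sigma2), columns by (sigma3, sigma4)."""
    labels = list(product((0, 1), repeat=2))
    return [[p[r + c] for c in labels] for r in labels]

def minors_3x3(matrix):
    """Yield all sixteen 3 x 3 subdeterminants of a 4 x 4 matrix."""
    for rows in combinations(range(4), 3):
        for cols in combinations(range(4), 3):
            yield det3([[matrix[r][c] for c in cols] for r in rows])

rng = random.Random(0)
s = {e: [[rng.randint(1, 9) for _ in range(2)] for _ in range(2)] for e in EDGES}
p = {sigma: coordinate(s, sigma) for sigma in product((0, 1), repeat=4)}
minors = list(minors_3x3(flattening_12_34(p)))
print(all(m == 0 for m in minors))
```

Integer parameters keep the arithmetic exact, so the check does not depend on a floating-point tolerance.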

We generalize the determinantal presentation in this example by proposing the following explicit solution
to Problem 5 for arbitrary binary trees $T$. Every edge of $T$ induces a split of the set of leaves {1, 2, ..., n},
corresponding to the two connected components of the tree obtained by removing that edge. The unrooted
tree underlying $T$ is uniquely determined by the set of these splits.

Conjecture 13. The ideal $I_f$ of phylogenetic invariants of the general Markov model for any binary tree $T$ on
binary random variables is generated by the 3 × 3-determinants of all two-dimensional matrices obtained
by flattening the 2 × ··· × 2-table $(p_{\sigma_1 \cdots \sigma_n})$ according to the splits induced by the edges of $T$.

We need to explain the meaning of the word "flattening". If $(A, B)$ is any split of the set {1, ..., n} then
this refers to the $2^{|A|} \times 2^{|B|}$-matrix whose rows and columns are indexed by functions $A \to \{0, 1\}$ and
$B \to \{0, 1\}$ respectively, and whose entries are the $2^n$ probabilities $p_{\sigma_1 \cdots \sigma_n}$.
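As an illustration of this definition, here is a minimal sketch of a flattening routine. The function name `flatten` and the representation of the table as a dictionary keyed by 0/1 tuples are assumptions made for the example, not notation from the paper.

```python
from itertools import product

def flatten(p, n, A):
    """Flatten the 2 x ... x 2 table p along the split (A, B) of {1, ..., n}.

    p maps 0/1 tuples of length n to numbers; A is a set of leaf labels.
    Rows are indexed by functions A -> {0,1}, columns by functions B -> {0,1}.
    """
    A = sorted(A)
    B = [leaf for leaf in range(1, n + 1) if leaf not in A]
    matrix = []
    for a in product((0, 1), repeat=len(A)):
        row = []
        for b in product((0, 1), repeat=len(B)):
            sigma = [None] * n
            for leaf, value in zip(A, a):
                sigma[leaf - 1] = value
            for leaf, value in zip(B, b):
                sigma[leaf - 1] = value
            row.append(p[tuple(sigma)])
        matrix.append(row)
    return matrix

# Toy table on n = 3 leaves: each entry is the integer whose binary digits are sigma.
toy = {sigma: 4 * sigma[0] + 2 * sigma[1] + sigma[2]
       for sigma in product((0, 1), repeat=3)}
print(flatten(toy, 3, {1, 3}))  # 4 x 2 matrix: rows (sigma1, sigma3), columns sigma2
```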

In December 2003, Allman and Rhodes announced a proof of the set-theoretic version of our Conjecture
13. What this means algebraically is that $I_f$ equals the radical of the ideal generated by the aforementioned
3 × 3-determinants. In light of this progress, we wish to offer also the following tropical version of
Conjecture 13. It would be very nice to show that Proposition 4 extends to this situation. However, none of the
remaining discussion in this section depends on these conjectures.

Conjecture 14. The map $f$ is surjectively positive for $l = 2$. The tropical variety (resp. the positive tropical
variety) of the prime ideal $I_f$ coincides with the set of all 2 × 2 × ··· × 2-tables $(u_{\sigma_1 \cdots \sigma_n})$ whose flattenings
along the splits of the tree $T$ have tropical rank (resp. Barvinok rank) at most 2.

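The sum-product algorithm is used in practice to evaluate the polynomial (6). Its running time is linear
in n, despite the fact that the number of terms in (6) grows exponentially. This reduction in complexity
is achieved by recursively grouping subsums. For instance, (7) becomes

$$p_{\sigma_1\sigma_2\sigma_3\sigma_4} \;=\; \sum_{\nu=0}^{1} \left( s^{(75)}_{\nu 0} s^{(51)}_{0\sigma_1} s^{(52)}_{0\sigma_2} + s^{(75)}_{\nu 1} s^{(51)}_{1\sigma_1} s^{(52)}_{1\sigma_2} \right) \left( s^{(76)}_{\nu 0} s^{(63)}_{0\sigma_3} s^{(64)}_{0\sigma_4} + s^{(76)}_{\nu 1} s^{(63)}_{1\sigma_3} s^{(64)}_{1\sigma_4} \right). \tag{9}$$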

The rule to remember is this: Polynomials are evaluated recursively as sums of products of smaller
polynomials. This is the solution to Problem 1. For details on the tree case see [10].
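The recursive grouping can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions rather than the authors' implementation: the tree is encoded by a hypothetical `CHILDREN` dictionary for the four-leaf example (root 7, internal nodes 5 and 6), parameters are random integers, and the result is compared with the exponential brute-force sum.

```python
from itertools import product
import random

# CHILDREN[i] = (j, k) for every internal node i; leaves 1..4 have no entry.
CHILDREN = {7: (5, 6), 5: (1, 2), 6: (3, 4)}
ROOT, L = 7, 2  # root node and number of states l

def table(s, sigma, i):
    """For each state h_i, the value of the sub-polynomial below node i.

    Every node is visited once, so the work is linear in the number of nodes.
    """
    sub = {j: table(s, sigma, j) for j in CHILDREN[i] if j in CHILDREN}
    values = []
    for hi in range(L):
        value = 1
        for j in CHILDREN[i]:
            if j in sub:   # internal child: group the subsum over its state
                value *= sum(s[(i, j)][hi][c] * sub[j][c] for c in range(L))
            else:          # leaf child: state fixed by the observation
                value *= s[(i, j)][hi][sigma[j - 1]]
        values.append(value)
    return values

def sum_product(s, sigma):
    return sum(table(s, sigma, ROOT))

def brute_force(s, sigma):
    """Direct sum over all colorations of the internal nodes."""
    internal = sorted(CHILDREN)
    total = 0
    for hs in product(range(L), repeat=len(internal)):
        state = dict(zip(internal, hs))
        state.update({leaf: sigma[leaf - 1] for leaf in range(1, len(sigma) + 1)})
        term = 1
        for i, kids in CHILDREN.items():
            for j in kids:
                term *= s[(i, j)][state[i]][state[j]]
        total += term
    return total

rng = random.Random(0)
s = {(i, j): [[rng.randint(1, 9) for _ in range(L)] for _ in range(L)]
     for i, kids in CHILDREN.items() for j in kids}
print(sum_product(s, (0, 1, 1, 0)) == brute_force(s, (0, 1, 1, 0)))
```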

Problem 2 is known in phylogeny as the joint ancestral reconstruction problem, which asks for the
maximum a posteriori ancestral assignments $h_i$ given the observations $(\sigma_1, \ldots, \sigma_n)$ at the leaves. An efficient
method for solving this problem has been published; it is nothing but the sum-product algorithm with
ordinary arithmetic (+, ×) replaced by tropical arithmetic (min, +). The $\sigma$-coordinate of the tropicalization
$g : \mathbb{R}^{(2n-2) l^2} \to \mathbb{R}^{l^n}$ of the map $f$ is

$$g_{\sigma_1 \cdots \sigma_n} \;=\; \min_{h} \sum_{\substack{i \in T' \\ \text{with children } j,k}} \left( v^{(ij)}_{h_i h_j} + v^{(ik)}_{h_i h_k} \right). \tag{10}$$

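This expression can be evaluated efficiently by the same scheme as before. The rule now is this:
Piecewise-linear concave functions are evaluated recursively as minima of sums of smaller such functions. A simple
example illustrating this rule is the tropicalization of (9):

$$g_{\sigma_1\sigma_2\sigma_3\sigma_4} \;=\; \min_{\nu \in \{0,1\}} \left( u_{\nu\sigma_1\sigma_2} + u_{\nu\sigma_3\sigma_4} \right) \tag{11}$$

where $u_{\nu\sigma_1\sigma_2} = \min\left( v^{(75)}_{\nu 0} + v^{(51)}_{0\sigma_1} + v^{(52)}_{0\sigma_2},\; v^{(75)}_{\nu 1} + v^{(51)}_{1\sigma_1} + v^{(52)}_{1\sigma_2} \right)$ and similarly for $u_{\nu\sigma_3\sigma_4}$.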
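The tropical recursion also yields the optimal ancestral assignment if each minimum records where it was attained. The following is a minimal sketch under assumptions that are not in the paper: the four-leaf tree is encoded as a hypothetical `CHILDREN` dictionary, and the weights $v$ are random integers standing in for negative logarithms of the parameters, which keeps comparisons exact.

```python
from itertools import product
import random

CHILDREN = {7: (5, 6), 5: (1, 2), 6: (3, 4)}  # internal nodes; leaves are 1..4
ROOT, L = 7, 2

def best_table(v, sigma, i):
    """For each state h_i: (minimal weight below node i, minimizing states below i)."""
    sub = {j: best_table(v, sigma, j) for j in CHILDREN[i] if j in CHILDREN}
    out = []
    for hi in range(L):
        weight, states = 0, {}
        for j in CHILDREN[i]:
            if j in sub:
                # Tropical "sum": minimum over the hidden state c of child j.
                c = min(range(L), key=lambda c: v[(i, j)][hi][c] + sub[j][c][0])
                weight += v[(i, j)][hi][c] + sub[j][c][0]
                states.update(sub[j][c][1])
                states[j] = c
            else:
                weight += v[(i, j)][hi][sigma[j - 1]]
        out.append((weight, states))
    return out

def map_reconstruction(v, sigma):
    """Return (g_sigma, hidden states attaining the minimum)."""
    root = best_table(v, sigma, ROOT)
    r = min(range(L), key=lambda r: root[r][0])
    weight, states = root[r]
    return weight, {**states, ROOT: r}

def total_weight(v, sigma, hidden):
    """Weight of one coloration: the sum of the edge weights it selects."""
    state = dict(hidden)
    state.update({leaf: sigma[leaf - 1] for leaf in range(1, len(sigma) + 1)})
    return sum(v[(i, j)][state[i]][state[j]]
               for i, kids in CHILDREN.items() for j in kids)

rng = random.Random(1)
v = {(i, j): [[rng.randint(0, 20) for _ in range(L)] for _ in range(L)]
     for i, kids in CHILDREN.items() for j in kids}
weight, hidden = map_reconstruction(v, (0, 1, 1, 0))
print(weight, hidden)
```

Ties between equally good assignments are broken arbitrarily by `min`, so the returned coloration is one optimal explanation rather than the complete set.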
We saw in Section 4 that the number of vertices of the Newton polytopes of the coordinate polynomials
$f_\sigma$ is critical for efficient parametric inference. That number grows polynomially in n if the number of
parameters is fixed (thanks to the theorem bounding the vertex count), but it may grow exponentially if the
number of parameters is not bounded. For the general Markov model on a tree $T$, the growth will be exponential
unless we restrict the number of parameters. This can be done, for instance, by considering the homogeneous
tree model where the transition matrices along all edges are identical:

$s^{(e)}_{ij}$ is independent of the edge $e$.

Using that theorem, we obtain the following result, analogous to the earlier corollary.

Proposition 15. The number of vertices of the Newton polytope of any coordinate $f_\sigma$ in the homogeneous
tree model is bounded above by a polynomial in $n$ whose degree and leading constant depend only on $l$.

For tree models which are used in applications, such as phylogenetics, the number of parameters is
likely to be reduced even further. In such cases, the parametric joint ancestral reconstruction problem can
be solved efficiently using the polytope propagation techniques in Proposition 11.

6 Summary: A Statistics - Geometry Dictionary 

The algebraic representation for graphical models with hidden variables leads naturally to an interpretation 
of a parameterized model as a point on an algebraic variety. Marginal probabilities are coordinates of 
points on the variety. Varieties can be tropicalized, and the statistical meaning is that the MAP probabilities 
(calculated with logarithms of the parameters) can be interpreted as coordinates of points on the positive 
part of the tropical variety. Hence, the tropical model is fundamental for understanding MAP probabilities. 
Although we have not addressed it in this paper, the logarithms of the marginal probabilities are coordinates 
of points on the amoeba [23] of the model. Amoebas are likely to be important for understanding the
geometry of maximum likelihood estimation. 

The sum-product algorithm for graphical models is an efficient method for evaluating the coordinate
polynomials of a graphical model. This algorithm works in exactly the same way for classical arithmetic
(+, ×) and for tropical arithmetic (min, +). This means that the same method is used to evaluate coordinates
of points on the variety and of points on the tropical variety.
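One way to make this concrete is to write the recursion once and pass the two arithmetic operations in as arguments. The sketch below is illustrative only: the four-leaf tree of Section 5 is hard-coded through a hypothetical `CHILDREN` dictionary with binary states, and the function names are not from the paper.

```python
import operator

CHILDREN = {7: (5, 6), 5: (1, 2), 6: (3, 4)}  # internal nodes; leaves are 1..4

def table(params, sigma, add, mul, i):
    """Values below node i for h_i = 0, 1, computed in the semiring (add, mul)."""
    sub = {j: table(params, sigma, add, mul, j) for j in CHILDREN[i] if j in CHILDREN}
    out = []
    for hi in (0, 1):
        factors = []
        for j in CHILDREN[i]:
            if j in sub:
                factors.append(add(mul(params[(i, j)][hi][0], sub[j][0]),
                                   mul(params[(i, j)][hi][1], sub[j][1])))
            else:
                factors.append(params[(i, j)][hi][sigma[j - 1]])
        out.append(mul(factors[0], factors[1]))
    return out

def evaluate(params, sigma, add, mul):
    root = table(params, sigma, add, mul, 7)
    return add(root[0], root[1])

edges = [(i, j) for i, kids in CHILDREN.items() for j in kids]
twos = {e: [[2, 2], [2, 2]] for e in edges}    # classical parameters
threes = {e: [[3, 3], [3, 3]] for e in edges}  # tropical weights

classical = evaluate(twos, (0, 0, 0, 0), operator.add, operator.mul)
tropical = evaluate(threes, (0, 0, 0, 0), min, operator.add)
print(classical, tropical)  # 8 terms of 2^6 each; minimum of sums of six 3's
```

With every parameter equal to 2, each of the 8 terms is 2^6 = 64, giving 512; with every weight equal to 3, each term is a sum of six weights, giving a minimum of 18.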

An explanation for an observation $\sigma$ is a vertex of the Newton polytope of $f_\sigma$. Thus, the parametric
inference problem is solved by finding the normal fans of the Newton polytopes of the coordinate polynomials.
For many important applications, the number of vertices of the polytopes is polynomial in the size of the
graphical model. The polytope propagation algorithm, which is a geometric analog of the sum-product
algorithm, finds the Newton polytopes, and is efficient when the sum-product algorithm is fast and the number
of vertices on the Newton polytopes is small.
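A small sketch may help to see how the propagation works: products of polynomials become Minkowski sums of polytopes, and sums become convex hulls of unions. The example below rests on assumptions that are not in the paper: it uses a hypothetical two-parameter homogeneous model on the four-leaf tree with $s_{00} = s_{11} = 1$, $s_{01} = x$, $s_{10} = y$, so that Newton polytopes are planar and a standard monotone-chain hull suffices. It recomputes subtrees without caching, which is acceptable only for a tree this small.

```python
from itertools import product

CHILDREN = {7: (5, 6), 5: (1, 2), 6: (3, 4)}  # internal nodes; leaves are 1..4
# Exponent vector (a, b) of x^a y^b contributed by one edge with states (parent, child).
EXPONENT = {(0, 0): (0, 0), (1, 1): (0, 0), (0, 1): (1, 0), (1, 0): (0, 1)}

def hull(points):
    """Vertices of the convex hull of planar integer points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out
    lower, upper = half(pts), half(reversed(pts))
    return sorted(set(lower[:-1] + upper[:-1]))

def minkowski(P, Q):
    """Product of polynomials: Minkowski sum of their Newton polygons."""
    return hull([(a[0] + b[0], a[1] + b[1]) for a in P for b in Q])

def union(*polygons):
    """Sum of polynomials with positive coefficients: hull of the union."""
    return hull([p for P in polygons for p in P])

def polygon_below(sigma, i, hi):
    result = [(0, 0)]
    for j in CHILDREN[i]:
        if j in CHILDREN:
            branch = union(*[minkowski([EXPONENT[(hi, c)]], polygon_below(sigma, j, c))
                             for c in (0, 1)])
        else:
            branch = [EXPONENT[(hi, sigma[j - 1])]]
        result = minkowski(result, branch)
    return result

def newton_polygon(sigma):
    return union(*[polygon_below(sigma, 7, r) for r in (0, 1)])

def brute_force_polygon(sigma):
    """Hull of the exponent vectors of all 8 monomials, for comparison."""
    pts = []
    for h5, h6, h7 in product((0, 1), repeat=3):
        state = {5: h5, 6: h6, 7: h7, 1: sigma[0], 2: sigma[1], 3: sigma[2], 4: sigma[3]}
        a = b = 0
        for i, kids in CHILDREN.items():
            for j in kids:
                e = EXPONENT[(state[i], state[j])]
                a, b = a + e[0], b + e[1]
        pts.append((a, b))
    return hull(pts)

print(newton_polygon((0, 1, 1, 0)))
```

Pruning to hull vertices after every step is what keeps the intermediate point sets small; with more than two parameters the planar hull routine would have to be replaced by a general convex hull computation.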

An inference function for a graphical model is a function from the set of observations to the set of
explanations which maximizes the a posteriori probabilities with respect to some choice of parameters.
Inference functions correspond to vertices of the Newton polytope of the map $f$. This polytope is much
larger than the Newton polytope of a single coordinate $f_\sigma$, so it can only be computed for small graphical
models, but it has the advantage that it encodes the entire piecewise-linear geometry of the model.

In a companion paper, we show that polytope propagation is practical and useful in the important
application of biological sequence analysis. In particular, existing parametric alignment methods [11, 13, 24]
can be viewed as special cases of parametric inference for pair hidden Markov models. The computation of
the Newton polytopes is also useful for Bayesian computations, where we have priors on the parameters and
it is of interest to integrate over the maximal cones in the normal fan of the Newton polytope (see §5 of the
companion paper).